\chapter{Reliability Methods}\label{uq:reliability}

%This chapter explores local and global reliability methods in greater
%detail than that provided in the Uncertainty Quantification chapter of
%the User's Manual.


\section{Local Reliability Methods}\label{uq:reliability:local}

Local reliability methods include the Mean Value method and the family
of most probable point (MPP) search methods.  Each of these methods is
gradient-based, employing local approximations and/or local
optimization methods.


\subsection{Mean Value}\label{uq:reliability:local:mv}

The Mean Value method (MV, also known as MVFOSM in \cite{Hal00}) is
the simplest, least-expensive reliability method because it estimates
the response means, response standard deviations, and all CDF/CCDF
response-probability-reliability levels from a single evaluation of
response functions and their gradients at the uncertain variable
means.  This approximation can have acceptable accuracy when the
response functions are nearly linear and their distributions are
approximately Gaussian, but can have poor accuracy in other
situations.  The expressions for approximate response mean $\mu_g$,
approximate response variance $\sigma^2_g$, response target to
approximate probability/reliability level mapping ($\bar{z} \to p,\beta$),
and probability/reliability target to approximate response level mapping
($\bar{p},\bar{\beta} \to z$) are
\begin{eqnarray}
\mu_g      & = & g(\mu_{\bf x})  \label{eq:mv_mean1} \\
\sigma^2_g & = & \sum_i \sum_j Cov(i,j) \frac{dg}{dx_i}(\mu_{\bf x})
                 \frac{dg}{dx_j}(\mu_{\bf x}) \label{eq:mv_std_dev} \\
\bar{z} \rightarrow \beta: & ~ & 
\beta_{\rm CDF} = \frac{\mu_g - \bar{z}}{\sigma_g}, ~~~~~
\beta_{\rm CCDF} = \frac{\bar{z} - \mu_g}{\sigma_g} \label{eq:mv_ria} \\
\bar{\beta} \rightarrow z: & ~ & 
z = \mu_g - \sigma_g \bar{\beta}_{\rm CDF}, ~~~~~
z = \mu_g + \sigma_g \bar{\beta}_{\rm CCDF} \label{eq:mv_pma}
\end{eqnarray}
respectively, where ${\bf x}$ are the uncertain values in the 
space of the original uncertain variables (``x-space''), $g({\bf x})$
is the limit state function (the response function for which
probability-response level pairs are needed), and $\beta_{\rm CDF}$ and
$\beta_{\rm CCDF}$ are the CDF and CCDF reliability indices, respectively.

With the introduction of second-order limit state information, MVSOSM
calculates a second-order mean as
\begin{equation}
\mu_g = g(\mu_{\bf x}) + \frac{1}{2} \sum_i \sum_j Cov(i,j) 
\frac{d^2g}{dx_i dx_j}(\mu_{\bf x}) \label{eq:mv_mean2}
\end{equation}

This is commonly combined with a first-order variance
(Equation~\ref{eq:mv_std_dev}), since second-order variance involves
higher order distribution moments (skewness, kurtosis)~\cite{Hal00}
which are often unavailable.

The first-order CDF probability $p(g \le z)$, first-order 
CCDF probability $p(g > z)$, $\beta_{\rm CDF}$, and $\beta_{\rm CCDF}$ are
related to one another through
\begin{eqnarray}
p(g \le z)  & = & \Phi(-\beta_{\rm CDF})     \label{eq:p_cdf} \\
p(g > z)    & = & \Phi(-\beta_{\rm CCDF})    \label{eq:p_ccdf} \\
\beta_{\rm CDF}  & = & -\Phi^{-1}(p(g \le z)) \label{eq:beta_cdf} \\
\beta_{\rm CCDF} & = & -\Phi^{-1}(p(g > z))   \label{eq:beta_ccdf} \\
\beta_{\rm CDF}  & = & -\beta_{\rm CCDF}       \label{eq:beta_cdf_ccdf} \\
p(g \le z)  & = & 1 - p(g > z)             \label{eq:p_cdf_ccdf}
\end{eqnarray}
where $\Phi()$ is the standard normal cumulative distribution
function.  A common convention in the literature is to define $g$ in
such a way that the CDF probability for a response level $z$ of zero
(i.e., $p(g \le 0)$) is the response metric of interest.  Dakota is
not restricted to this convention and is designed to support CDF or
CCDF mappings for general response, probability, and reliability level
sequences.

With the Mean Value method, it is possible to obtain 
importance factors indicating the relative importance of 
input variables.  The importance factors can be viewed
as an extension of linear sensitivity analysis combining deterministic
gradient information with input uncertainty information,
\emph{i.e}. input variable standard deviations. The accuracy of the
importance factors is contingent of the validity of the linear
approximation used to approximate the true response functions.
The importance factors are determined as: 

\begin{equation}
ImpFactor_i  = ({\frac{\sigma_{x_{i}}}{\sigma_g}}{\frac{dg}{dx_i}(\mu_{\bf x})})^2
\end{equation}


\subsection{MPP Search Methods}\label{uq:reliability:local:mpp}

All other local reliability methods solve an equality-constrained nonlinear
optimization problem to compute a most probable point (MPP) and then
integrate about this point to compute probabilities.  The MPP search
is performed in uncorrelated standard normal space (``u-space'') since
it simplifies the probability integration: the distance of the MPP
from the origin has the meaning of the number of input standard
deviations separating the mean response from a particular response
threshold.  The transformation from correlated non-normal
distributions (x-space) to uncorrelated standard normal distributions
(u-space) is denoted as ${\bf u} = T({\bf x})$ with the reverse
transformation denoted as ${\bf x} = T^{-1}({\bf u})$.  These
transformations are nonlinear in general, and possible approaches
include the Rosenblatt~\cite{Ros52}, Nataf~\cite{Der86}, and
Box-Cox~\cite{Box64} transformations.  The nonlinear transformations
may also be linearized, and common approaches for this include the
Rackwitz-Fiessler~\cite{Rac78} two-parameter equivalent normal and the
Chen-Lind~\cite{Che83} and Wu-Wirsching~\cite{Wu87} three-parameter
equivalent normals.  Dakota employs the Nataf nonlinear transformation
which is suitable for the common case when marginal distributions and
a correlation matrix are provided, but full joint distributions are
not known\footnote{If joint distributions are known, then the
Rosenblatt transformation is preferred.}.  This transformation occurs 
in the following two steps.  To transform between the
original correlated x-space variables and correlated standard normals
(``z-space''), a CDF matching condition is applied for each of the
marginal distributions:
\begin{equation}
\Phi(z_i) = F(x_i) \label{eq:trans_zx}
\end{equation}
where $F()$ is the cumulative distribution function of the original
probability distribution.  Then, to transform between correlated
z-space variables and uncorrelated u-space variables, the Cholesky 
factor ${\bf L}$ of a modified correlation matrix is used:
\begin{equation}
{\bf z} = {\bf L} {\bf u} \label{eq:trans_zu}
\end{equation}
where the original correlation matrix for non-normals in x-space has
been modified to represent the corresponding ``warped'' correlation in 
z-space~\cite{Der86}.

The forward reliability analysis algorithm of computing CDF/CCDF
probability/reliability levels for specified response levels is called
the reliability index approach (RIA), and the inverse reliability
analysis algorithm of computing response levels for specified CDF/CCDF
probability/reliability levels is called the performance measure
approach (PMA)~\cite{Tu99}.  The differences between the RIA and PMA
formulations appear in the objective function and equality constraint
formulations used in the MPP searches.  For RIA, the MPP search for
achieving the specified response level $\bar{z}$ is formulated as
computing the minimum distance in u-space from the origin to the
$\bar{z}$ contour of the limit state response function:
\begin{eqnarray}
{\rm minimize}     & {\bf u}^T {\bf u} \nonumber \\
{\rm subject \ to} & G({\bf u}) = \bar{z} \label{eq:ria_opt}
\end{eqnarray}
where ${\bf u}$ is a vector centered at the origin in u-space and
$g({\bf x}) \equiv G({\bf u})$ by definition.  For PMA, the MPP search
for achieving the specified reliability level $\bar{\beta}$ or
first-order probability level $\bar{p}$ is formulated as computing the
minimum/maximum response function value corresponding to a prescribed
distance from the origin in u-space:
\begin{eqnarray}
{\rm minimize}     & \pm G({\bf u}) \nonumber \\
{\rm subject \ to} & {\bf u}^T {\bf u} = \bar{\beta}^2 \label{eq:pma_opt}
\end{eqnarray}
where $\bar{\beta}$ is computed from $\bar{p}$ using
Eq.~\ref{eq:beta_cdf} or~\ref{eq:beta_ccdf} in the latter case of a
prescribed first-order probability level.  For a specified generalized
reliability level $\bar{\beta^*}$ or second-order probability level
$\bar{p}$, the equality constraint is reformulated in terms of the
generalized reliability index:
\begin{eqnarray}
{\rm minimize}     & \pm G({\bf u}) \nonumber \\
{\rm subject \ to} & \beta^*({\bf u}) = \bar{\beta^*} \label{eq:pma2_opt}
\end{eqnarray}
where $\bar{\beta^*}$ is computed from $\bar{p}$ using
Eq.~\ref{eq:gen_beta} (or its CCDF complement) in the latter case of a
prescribed second-order probability level.

In the RIA case, the optimal MPP solution ${\bf u}^*$ defines the
reliability index from $\beta = \pm \|{\bf u}^*\|_2$, which in turn
defines the CDF/CCDF probabilities (using
Equations~\ref{eq:p_cdf}-\ref{eq:p_ccdf} in the case of first-order
integration).  The sign of $\beta$ is defined by
\begin{eqnarray}
G({\bf u}^*) > G({\bf 0}): \beta_{\rm CDF} < 0, \beta_{\rm CCDF} > 0 \\
G({\bf u}^*) < G({\bf 0}): \beta_{\rm CDF} > 0, \beta_{\rm CCDF} < 0
\end{eqnarray}
\noindent where $G({\bf 0})$ is the median limit state response computed 
at the origin in u-space\footnote{It is not necessary to explicitly compute
the median response since the sign of the inner product 
$\langle {\bf u}^*, \nabla_{\bf u} G \rangle$
can be used to determine the orientation of the optimal response with 
respect to the median response.} (where $\beta_{\rm CDF}$ = $\beta_{\rm CCDF}$ = 0 
and first-order $p(g \le z)$ = $p(g > z)$ = 0.5).  In the PMA case, the 
sign applied to $G({\bf u})$ (equivalent to minimizing or maximizing 
$G({\bf u})$) is similarly defined by either $\bar{\beta}$ (for a specified 
reliability or first-order probability level) or from a $\bar{\beta}$ 
estimate\footnote{computed by inverting the second-order probability 
relationships described in Section~\ref{uq:reliability:local:mpp:int} at 
the current ${\bf u}^*$ iterate.} computed from $\bar{\beta^*}$ (for a 
specified generalized reliability or second-order probability level)
\begin{eqnarray}
\bar{\beta}_{\rm CDF} < 0, \bar{\beta}_{\rm CCDF} > 0: {\rm maximize \ } G({\bf u}) \\
\bar{\beta}_{\rm CDF} > 0, \bar{\beta}_{\rm CCDF} < 0: {\rm minimize \ } G({\bf u})
\end{eqnarray}
where the limit state at the MPP ($G({\bf u}^*)$) defines the desired
response level result.

\subsubsection{Limit state approximations} \label{uq:reliability:local:mpp:approx}

There are a variety of algorithmic variations that are available for
use within RIA/PMA reliability analyses.  First, one may select among
several different limit state approximations that can be used to
reduce computational expense during the MPP searches.  Local,
multipoint, and global approximations of the limit state are possible.
\cite{Eld05} investigated local first-order limit state 
approximations, and \cite{Eld06a} investigated local second-order
and multipoint approximations.  These techniques include:

\begin{enumerate}
\item a single Taylor series per response/reliability/probability level 
in x-space centered at the uncertain variable means.  The first-order 
approach is commonly known as the Advanced Mean Value (AMV) method:
\begin{equation}
g({\bf x}) \cong g(\mu_{\bf x}) + \nabla_x g(\mu_{\bf x})^T 
({\bf x} - \mu_{\bf x}) \label{eq:linear_x_mean}
\end{equation}
and the second-order approach has been named AMV$^2$:
\begin{equation}
g({\bf x}) \cong g(\mu_{\bf x}) + \nabla_{\bf x} g(\mu_{\bf x})^T 
({\bf x} - \mu_{\bf x}) + \frac{1}{2} ({\bf x} - \mu_{\bf x})^T 
\nabla^2_{\bf x} g(\mu_{\bf x}) ({\bf x} - \mu_{\bf x})
\label{eq:taylor2_x_mean}
\end{equation}

\item same as AMV/AMV$^2$, except that the Taylor series is expanded 
in u-space.  The first-order option has been termed the u-space AMV 
method:
\begin{equation}
G({\bf u}) \cong G(\mu_{\bf u}) + \nabla_u G(\mu_{\bf u})^T 
({\bf u} - \mu_{\bf u}) \label{eq:linear_u_mean}
\end{equation}
where $\mu_{\bf u} = T(\mu_{\bf x})$ and is nonzero in general, and 
the second-order option has been named the u-space AMV$^2$ method:
\begin{equation}
G({\bf u}) \cong G(\mu_{\bf u}) + \nabla_{\bf u} G(\mu_{\bf u})^T 
({\bf u} - \mu_{\bf u}) + \frac{1}{2} ({\bf u} - \mu_{\bf u})^T 
\nabla^2_{\bf u} G(\mu_{\bf u}) ({\bf u} - \mu_{\bf u}) 
\label{eq:taylor2_u_mean}
\end{equation}

\item an initial Taylor series approximation in x-space at the uncertain 
variable means, with iterative expansion updates at each MPP estimate
(${\bf x}^*$) until the MPP converges.  The first-order option is
commonly known as AMV+:
\begin{equation}
g({\bf x}) \cong g({\bf x}^*) + \nabla_x g({\bf x}^*)^T ({\bf x} - {\bf x}^*)
\label{eq:linear_x_mpp}
\end{equation}
and the second-order option has been named AMV$^2$+:
\begin{equation}
g({\bf x}) \cong g({\bf x}^*) + \nabla_{\bf x} g({\bf x}^*)^T 
({\bf x} - {\bf x}^*) + \frac{1}{2} ({\bf x} - {\bf x}^*)^T 
\nabla^2_{\bf x} g({\bf x}^*) ({\bf x} - {\bf x}^*) \label{eq:taylor2_x_mpp}
\end{equation}

\item same as AMV+/AMV$^2$+, except that the expansions are performed in 
u-space.  The first-order option has been termed the u-space AMV+ method.
\begin{equation}
G({\bf u}) \cong G({\bf u}^*) + \nabla_u G({\bf u}^*)^T ({\bf u} - {\bf u}^*)
\label{eq:linear_u_mpp}
\end{equation}
and the second-order option has been named the u-space AMV$^2$+ method:
\begin{equation}
G({\bf u}) \cong G({\bf u}^*) + \nabla_{\bf u} G({\bf u}^*)^T 
({\bf u} - {\bf u}^*) + \frac{1}{2} ({\bf u} - {\bf u}^*)^T 
\nabla^2_{\bf u} G({\bf u}^*) ({\bf u} - {\bf u}^*) \label{eq:taylor2_u_mpp}
\end{equation}

\item a multipoint approximation in x-space. This approach involves a 
Taylor series approximation in intermediate variables where the powers
used for the intermediate variables are selected to match information at
the current and previous expansion points.  Based on the 
two-point exponential approximation concept (TPEA, \cite{Fad90}), the 
two-point adaptive nonlinearity approximation (TANA-3, \cite{Xu98})
approximates the limit state as:
\begin{equation}
g({\bf x}) \cong g({\bf x}_2) + \sum_{i=1}^n 
\frac{\partial g}{\partial x_i}({\bf x}_2) \frac{x_{i,2}^{1-p_i}}{p_i} 
(x_i^{p_i} - x_{i,2}^{p_i}) + \frac{1}{2} \epsilon({\bf x}) \sum_{i=1}^n 
(x_i^{p_i} - x_{i,2}^{p_i})^2 \label{eq:tana_g}
\end{equation}

\noindent where $n$ is the number of uncertain variables and:
\begin{eqnarray}
p_i & = & 1 + \ln \left[ \frac{\frac{\partial g}{\partial x_i}({\bf x}_1)}
{\frac{\partial g}{\partial x_i}({\bf x}_2)} \right] \left/ 
\ln \left[ \frac{x_{i,1}}{x_{i,2}} \right] \right. \label{eq:tana_pi_x} \\
\epsilon({\bf x}) & = & \frac{H}{\sum_{i=1}^n (x_i^{p_i} - x_{i,1}^{p_i})^2 + 
\sum_{i=1}^n (x_i^{p_i} - x_{i,2}^{p_i})^2} \label{eq:tana_eps_x} \\
H & = & 2 \left[ g({\bf x}_1) - g({\bf x}_2) - \sum_{i=1}^n 
\frac{\partial g}{\partial x_i}({\bf x}_2) \frac{x_{i,2}^{1-p_i}}{p_i} 
(x_{i,1}^{p_i} - x_{i,2}^{p_i}) \right] \label{eq:tana_H_x}
\end{eqnarray}
\noindent and ${\bf x}_2$ and ${\bf x}_1$ are the current and previous
MPP estimates in x-space, respectively.  Prior to the availability of
two MPP estimates, x-space AMV+ is used.

\item a multipoint approximation in u-space. The u-space TANA-3
approximates the limit state as:
\begin{equation}
G({\bf u}) \cong G({\bf u}_2) + \sum_{i=1}^n 
\frac{\partial G}{\partial u_i}({\bf u}_2) \frac{u_{i,2}^{1-p_i}}{p_i} 
(u_i^{p_i} - u_{i,2}^{p_i}) + \frac{1}{2} \epsilon({\bf u}) \sum_{i=1}^n 
(u_i^{p_i} - u_{i,2}^{p_i})^2 \label{eq:tana_G}
\end{equation}

\noindent where:
\begin{eqnarray}
p_i & = & 1 + \ln \left[ \frac{\frac{\partial G}{\partial u_i}({\bf u}_1)}
{\frac{\partial G}{\partial u_i}({\bf u}_2)} \right] \left/ 
\ln \left[ \frac{u_{i,1}}{u_{i,2}} \right] \right. \label{eq:tana_pi_u} \\
\epsilon({\bf u}) & = & \frac{H}{\sum_{i=1}^n (u_i^{p_i} - u_{i,1}^{p_i})^2 + 
\sum_{i=1}^n (u_i^{p_i} - u_{i,2}^{p_i})^2} \label{eq:tana_eps_u} \\
H & = & 2 \left[ G({\bf u}_1) - G({\bf u}_2) - \sum_{i=1}^n 
\frac{\partial G}{\partial u_i}({\bf u}_2) \frac{u_{i,2}^{1-p_i}}{p_i} 
(u_{i,1}^{p_i} - u_{i,2}^{p_i}) \right] \label{eq:tana_H_u}
\end{eqnarray}
\noindent and ${\bf u}_2$ and ${\bf u}_1$ are the current and previous
MPP estimates in u-space, respectively.  Prior to the availability of
two MPP estimates, u-space AMV+ is used.

\item the MPP search on the original response functions without the 
use of any approximations.  Combining this option with first-order and
second-order integration approaches (see next section) results in the
traditional first-order and second-order reliability methods (FORM and
SORM).
\end{enumerate}

The Hessian matrices in AMV$^2$ and AMV$^2$+ may be available
analytically, estimated numerically, or approximated through
quasi-Newton updates.  The selection between x-space or u-space for
performing approximations depends on where the approximation will be
more accurate, since this will result in more accurate MPP estimates
(AMV, AMV$^2$) or faster convergence (AMV+, AMV$^2$+, TANA).  Since
this relative accuracy depends on the forms of the limit state $g(x)$
and the transformation $T(x)$ and is therefore application dependent
in general, Dakota supports both options.  A concern with
approximation-based iterative search methods (i.e., AMV+, AMV$^2$+ and
TANA) is the robustness of their convergence to the MPP.  It is
possible for the MPP iterates to oscillate or even diverge.  However,
to date, this occurrence has been relatively rare, and Dakota contains
checks that monitor for this behavior.  Another concern with TANA is
numerical safeguarding (e.g., the possibility of raising negative
$x_i$ or $u_i$ values to nonintegral $p_i$ exponents in
Equations~\ref{eq:tana_g}, \ref{eq:tana_eps_x}-\ref{eq:tana_G},
and~\ref{eq:tana_eps_u}-\ref{eq:tana_H_u}).  Safeguarding involves
offseting negative $x_i$ or $u_i$ and, for potential numerical
difficulties with the logarithm ratios in Equations~\ref{eq:tana_pi_x}
and~\ref{eq:tana_pi_u}, reverting to either the linear ($p_i = 1$) or
reciprocal ($p_i = -1$) approximation based on which approximation has
lower error in $\frac{\partial g}{\partial x_i}({\bf x}_1)$ or
$\frac{\partial G}{\partial u_i}({\bf u}_1)$.

\subsubsection{Probability integrations} \label{uq:reliability:local:mpp:int}

The second algorithmic variation involves the integration approach for
computing probabilities at the MPP, which can be selected to be
first-order (Equations~\ref{eq:p_cdf}-\ref{eq:p_ccdf}) or second-order
integration.  Second-order integration involves applying a curvature
correction~\cite{Bre84,Hoh88,Hon99}.  Breitung applies a correction
based on asymptotic analysis~\cite{Bre84}:
\begin{equation}
p = \Phi(-\beta_p) \prod_{i=1}^{n-1} \frac{1}{\sqrt{1 + \beta_p \kappa_i}}
\label{eq:p_2nd_breit}
\end{equation}
where $\kappa_i$ are the principal curvatures of the limit state
function (the eigenvalues of an orthonormal transformation of
$\nabla^2_{\bf u} G$, taken positive for a convex limit state) and
$\beta_p \ge 0$ (a CDF or CCDF probability correction is selected to
obtain the correct sign for $\beta_p$).  An alternate correction in
\cite{Hoh88} is consistent in the asymptotic regime ($\beta_p \to \infty$) 
but does not collapse to first-order integration for $\beta_p = 0$:
\begin{equation}
p = \Phi(-\beta_p) \prod_{i=1}^{n-1} 
\frac{1}{\sqrt{1 + \psi(-\beta_p) \kappa_i}} \label{eq:p_2nd_hr}
\end{equation}
where $\psi() = \frac{\phi()}{\Phi()}$ and $\phi()$ is the standard
normal density function.  \cite{Hon99} applies further corrections to
Equation~\ref{eq:p_2nd_hr} based on point concentration methods.  At
this time, all three approaches are available within the code, but the
Hohenbichler-Rackwitz correction is used by default (switching the 
correction is a compile-time option in the source code and has not
been exposed in the input specification).

\subsubsection{Hessian approximations} \label{sec:hessian}

To use a second-order Taylor series or a second-order integration when
second-order information ($\nabla^2_{\bf x} g$, $\nabla^2_{\bf u} G$,
and/or $\kappa$) is not directly available, one can estimate the
missing information using finite differences or approximate it through
use of quasi-Newton approximations.  These procedures will often be
needed to make second-order approaches practical for engineering
applications.

In the finite difference case, numerical Hessians are commonly computed
using either first-order forward differences of gradients using
\begin{equation}
\nabla^2 g ({\bf x}) \cong 
\frac{\nabla g ({\bf x} + h {\bf e}_i) - \nabla g ({\bf x})}{h}
\end{equation}
to estimate the $i^{th}$ Hessian column when gradients are analytically 
available, or second-order differences of function values using
\begin{equation}
\begin{array}{l}
\nabla^2 g ({\bf x}) \cong \frac{g({\bf x} + h {\bf e}_i + h {\bf e}_j) - 
g({\bf x} + h {\bf e}_i - h {\bf e}_j) - 
g({\bf x} - h {\bf e}_i + h {\bf e}_j) + 
g({\bf x} - h {\bf e}_i - h {\bf e}_j)}{4h^2}
\end{array}
\end{equation}
to estimate the $ij^{th}$ Hessian term when gradients are not directly
available.  This approach has the advantage of locally-accurate
Hessians for each point of interest (which can lead to quadratic
convergence rates in discrete Newton methods), but has the
disadvantage that numerically estimating each of the matrix terms can
be expensive.

Quasi-Newton approximations, on the other hand, do not reevaluate all
of the second-order information for every point of interest.  Rather,
they accumulate approximate curvature information over time using
secant updates.  Since they utilize the existing gradient evaluations,
they do not require any additional function evaluations for evaluating
the Hessian terms.  The quasi-Newton approximations of interest
include the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update
\begin{equation}
{\bf B}_{k+1} = {\bf B}_{k} - \frac{{\bf B}_k {\bf s}_k {\bf s}_k^T {\bf B}_k}
{{\bf s}_k^T {\bf B}_k {\bf s}_k} + 
\frac{{\bf y}_k {\bf y}_k^T}{{\bf y}_k^T {\bf s}_k} \label{eq:bfgs}
\end{equation}
which yields a sequence of symmetric positive definite Hessian
approximations, and the Symmetric Rank 1 (SR1) update
\begin{equation}
{\bf B}_{k+1} = {\bf B}_{k} + 
\frac{({\bf y}_k - {\bf B}_k {\bf s}_k)({\bf y}_k - {\bf B}_k {\bf s}_k)^T}
{({\bf y}_k - {\bf B}_k {\bf s}_k)^T {\bf s}_k} \label{eq:sr1}
\end{equation}
which yields a sequence of symmetric, potentially indefinite, Hessian 
approximations.  ${\bf B}_k$ is the $k^{th}$ approximation to
the Hessian $\nabla^2 g$, ${\bf s}_k = {\bf x}_{k+1} - {\bf x}_k$ is
the step and ${\bf y}_k = \nabla g_{k+1} - \nabla g_k$ is the
corresponding yield in the gradients.  The selection of BFGS versus SR1
involves the importance of retaining positive definiteness in the
Hessian approximations; if the procedure does not require it, then
the SR1 update can be more accurate if the true Hessian is not positive 
definite.  Initial scalings for ${\bf B}_0$ and numerical safeguarding 
techniques (damped BFGS, update skipping) are described in \cite{Eld06a}.


\subsubsection{Optimization algorithms}

The next algorithmic variation involves the optimization algorithm
selection for solving Eqs.~\ref{eq:ria_opt} and~\ref{eq:pma_opt}.  The
Hasofer-Lind Rackwitz-Fissler (HL-RF) algorithm~\cite{Hal00} is a
classical approach that has been broadly applied.  It is a
Newton-based approach lacking line search/trust region globalization,
and is generally regarded as computationally efficient but
occasionally unreliable.  Dakota takes the approach of employing
robust, general-purpose optimization algorithms with provable
convergence properties.  In particular, we employ the sequential
quadratic programming (SQP) and nonlinear interior-point (NIP)
optimization algorithms from the NPSOL~\cite{Gil86} and
OPT++~\cite{MeOlHoWi07} libraries, respectively.


\subsubsection{Warm Starting of MPP Searches}  

The final algorithmic variation for local reliability methods involves
the use of warm starting approaches for improving computational
efficiency.  \cite{Eld05} describes the acceleration of MPP
searches through warm starting with approximate iteration increment,
with $z/p/\beta$ level increment, and with design variable increment.
Warm started data includes the expansion point and associated response
values and the MPP optimizer initial guess.  Projections are used when
an increment in $z/p/\beta$ level or design variables occurs.  Warm
starts were consistently effective in \cite{Eld05}, with greater
effectiveness for smaller parameter changes, and are used by default
in Dakota. %for all computational experiments presented in this paper.


\section{Global Reliability Methods}\label{uq:reliability:global}

Local reliability methods, while computationally efficient, have
well-known failure mechanisms.  When confronted with a limit state
function that is nonsmooth, local gradient-based optimizers may stall
due to gradient inaccuracy and fail to converge to an MPP.  Moreover,
if the limit state is multimodal (multiple MPPs), then a
gradient-based local method can, at best, locate only one local MPP
solution.  Finally, a linear (Eqs.~\ref{eq:p_cdf}--\ref{eq:p_ccdf}) or
parabolic (Eqs.~\ref{eq:p_2nd_breit}--\ref{eq:p_2nd_hr}) approximation
to the limit state at this MPP may fail to adequately capture the
contour of a highly nonlinear limit state.  %For these reasons,
%efficient global reliability analysis (EGRA) is investigated
%in~\cite{bichon_egra_sdm}.

%Global reliability methods include the efficient global reliability
%analysis (EGRA) method. Analytical methods of reliability analysis solve a 
%local optimization problem to locate the most probable point of failure (MPP), 
%and then quantify the reliability based on its location and an approximation 
%to the shape of the limit state at this point. Typically, gradient-based 
%solvers are used to solve this optimization problem, which may fail to 
%converge for nonsmooth response functions with unreliable gradients or 
%may converge to only one of several solutions for response functions that 
%possess multiple local optima. In addition to these MPP convergence issues, 
%the evaluated probabilites can be adversely affected by limit state 
%approximations that may be inaccurate. Analysts are then forced
%to revert to sampling methods, which do not rely on MPP convergence or 
%simplifying approximations to the true shape of the limit state. 
%However, employing such methods is impractical when evaluation of the 
%response function is expensive.

A reliability analysis method that is both efficient when applied to
expensive response functions and accurate for a response function of
any arbitrary shape is needed. This section develops such a method
based on efficient global optimization~\cite{Jon98} (EGO) to the
search for multiple points on or near the limit state throughout the
random variable space. By locating multiple points on the limit state,
more complex limit states can be accurately modeled, resulting in a
more accurate assessment of the reliability. It should be emphasized
here that these multiple points exist on a single limit state. Because
of its roots in efficient global optimization, this method of
reliability analysis is called efficient global reliability analysis
(EGRA)~\cite{Bic07}.  The following two subsections describe two
capabilities that are incorporated into the EGRA algorithm: importance
sampling and EGO.

\subsection{Importance Sampling}\label{uq:reliability:global:ais}

An alternative to MPP search methods is to directly 
perform the probability integration numerically by sampling the 
response function.
Sampling methods do not rely on a simplifying approximation to the shape
of the limit state, so they can be more accurate than FORM and SORM, but they
can also be prohibitively expensive because they generally require a large
number of response function evaluations.
Importance sampling methods reduce this expense by focusing the samples in
the important regions of the uncertain space.
They do this by centering the sampling density function at the MPP rather
than at the mean.
This ensures the samples will lie the region of interest,
thus increasing the efficiency of the sampling method.
Adaptive importance sampling (AIS) further improves the efficiency by
adaptively updating the sampling density function.
Multimodal adaptive importance sampling~\cite{Dey98,Zou02} is a
variation of AIS that allows for the use of multiple sampling densities
making it better suited for cases where multiple sections of the limit state
are highly probable.

Note that importance sampling methods require that the location of at 
least one MPP be known because it is used to center the initial sampling 
density. However, current gradient-based, local search methods used in 
MPP search may fail to converge or may converge to poor solutions for 
highly nonlinear problems, possibly making these methods inapplicable.
As the next section describes, EGO is a global optimization method that 
does not depend on the availability of accurate gradient information, making
convergence more reliable for nonsmooth response functions.
Moreover, EGO has the ability to locate multiple failure points, 
which would provide multiple starting points and thus a good 
multimodal sampling density for the initial steps of multimodal AIS.
The resulting Gaussian process model is accurate in the
vicinity of the limit state, thereby providing an inexpensive surrogate that
can be used to provide response function samples.
As will be seen, using EGO to locate multiple points along the limit state, 
and then using the resulting Gaussian process model to provide function 
evaluations in multimodal AIS for the probability integration, 
results in an accurate and efficient reliability analysis tool.

\subsection{Efficient Global Optimization}\label{uq:reliability:global:ego}

Efficient Global Optimization (EGO) was developed to facilitate the 
unconstrained minimization of expensive implicit response functions.
The method builds an initial Gaussian process model as a global surrogate 
for the response function, then intelligently selects additional samples 
to be added for inclusion in a new Gaussian process model in subsequent 
iterations. The new samples are selected based on how much they are expected 
to improve the current best solution to the optimization problem.
When this expected improvement is acceptably small, the globally optimal 
solution has been found. The application of this methodology to 
equality-constrained reliability analysis is the primary contribution of 
EGRA.  

Efficient global optimization was originally proposed by
Jones et al.~\cite{Jon98}~and has been adapted into similar methods
such as sequential kriging optimization (SKO)~\cite{Hua06}.
The main difference between SKO and EGO lies within the specific formulation
of what is known as the expected improvement function (EIF), which is the
feature that sets all EGO/SKO-type methods apart from other global optimization
methods.
The EIF is used to select the location at which a new training point should be
added to the Gaussian process model by maximizing the amount of improvement in 
the objective function that can be expected by adding that point.
A point could be expected to produce an improvement in the objective function
if its predicted value is better than the current best solution, or if the
uncertainty in its prediction is such that the probability of it producing
a better solution is high.
Because the uncertainty is higher in regions of the design space with fewer
observations, this provides a balance between exploiting areas of the
design space that predict good solutions, and exploring areas where more
information is needed.

The general procedure of these EGO-type methods is:
\begin{enumerate}
\item Build an initial Gaussian process model of the objective function.
%\item Use cross validation to ensure that the kriging model is satisfactory.
\item Find the point that maximizes the EIF.
      If the EIF value at this point is sufficiently small, stop.
\item Evaluate the objective function at the point where the EIF is maximized.
      Update the Gaussian process model using this new point.
      Go to Step 2.
\end{enumerate}
%\noindent To construct a parallel algorithm, the $n$ best points could be 
%selected and evaluated in steps 2 and 3.
\noindent 

The following sections discuss the construction of the Gaussian process model
used, the form of the EIF, and then a description of how that EIF is modified
for application to reliability analysis.

\subsubsection{Gaussian Process Model}\label{uq:reliability:global:ego:gpm}

Gaussian process (GP) models are set apart from other surrogate models because
they provide not just a predicted value at an unsampled point, but also and
estimate of the prediction variance.
This variance gives an indication of the uncertainty in the GP model, which
results from the construction of the covariance function. 
This function is based on the idea that when input points are near one another,
the correlation between their corresponding outputs will be high.
As a result, the uncertainty associated with the model's predictions will be
small for input points which are near the points used to train the model,
and will increase as one moves further from the training points.

It is assumed that the true response function being modeled $G({\bf u})$ can 
be described by:~\cite{Cre91}
\begin{equation}
G({\bf u})={\bf h}({\bf u})^T{\boldsymbol \beta} + Z({\bf u})
\end{equation}
\noindent where ${\bf h}()$ is the trend of the model, 
${\boldsymbol \beta}$ is the vector of trend coefficients, and
$Z()$ is a stationary Gaussian process with zero mean (and covariance defined 
below) that describes the departure of the model from its underlying trend.
The trend of the model can be assumed to be any function, but
taking it to be a constant value has been reported to be generally sufficient.~\cite{Sac89}
For the work presented here, the trend is assumed constant and
${\boldsymbol \beta}$ is taken as simply the mean of the responses at
the training points.
The covariance between outputs of the Gaussian process $Z()$ at points 
${\bf a}$ and ${\bf b}$ is defined as:
\begin{equation}
Cov \left[ Z({\bf a}),Z({\bf b}) \right] = \sigma_Z^2 R({\bf a},{\bf b})
\label{eq:cov}
\end{equation}
\noindent where $\sigma_Z^2$ is the process variance and $R()$ is the
correlation function.
There are several options for the correlation function, but the 
squared-exponential function is common~\cite{Sac89}, and is used here for $R()$:
\begin{equation}
R({\bf a},{\bf b}) = \exp \left[ -\sum_{i=1}^d \theta_i(a_i - b_i)^2 \right]
\end{equation}
\noindent where $d$ represents the dimensionality of the problem
(the number of random variables), and 
$\theta_i$ is a scale parameter that indicates the correlation between the 
points within dimension $i$.
A large $\theta_i$ is representative of a short correlation length.

The expected value $\mu_G()$ and variance $\sigma_G^2()$ of the GP model 
prediction at point ${\bf u}$ are:
\begin{align}
\mu_G({\bf u}) &= {\bf h}({\bf u})^T{\boldsymbol \beta} + 
  {\bf r}({\bf u})^T{\bf R}^{-1}({\bf g} - {\bf F}{\boldsymbol \beta}) 
  \label{eq:exp} \\
\sigma_G^2({\bf u}) &= \sigma_Z^2 - 
  \begin{bmatrix} {\bf h}({\bf u})^T  & 
                  {\bf r}({\bf u})^T  \end{bmatrix}
  \begin{bmatrix} {\bf 0} & {\bf F}^T \\ 
                  {\bf F} & {\bf R}   \end{bmatrix}^{-1}
  \begin{bmatrix} {\bf h}({\bf u})    \\ 
                  {\bf r}({\bf u})    \end{bmatrix} \label{eq:var}
\end{align}
where ${\bf r}({\bf u})$ is a vector containing the covariance between 
${\bf u}$ and each of the $n$ training points (defined by Eq.~\ref{eq:cov}),
${\bf R}$ is an $n \times n$ matrix containing the correlation between each
pair of training points,
${\bf g}$ is the vector of response outputs at each of the training points, and
${\bf F}$ is an $n \times q$ matrix with rows ${\bf h}({\bf u}_i)^T$ (the
trend function for training point $i$ containing $q$ terms; for a constant
trend $q\!=\!1$).
This form of the variance accounts for the uncertainty in the trend 
coefficients $\boldsymbol \beta$, but assumes that the parameters governing
the covariance function ($\sigma_Z^2$ and $\boldsymbol \theta$) have known 
values.

The parameters $\sigma_Z^2$ and ${\boldsymbol \theta}$ are determined through 
maximum likelihood estimation.
This involves taking the log of the probability of observing the response 
values ${\bf g}$ given the covariance matrix ${\bf R}$, which can be written 
as:~\cite{Sac89}
\begin{equation}
\log \left[ p({\bf g} | {\bf R}) \right] = 
  -\frac{1}{n} \log \lvert{\bf R}\rvert - \log(\hat{\sigma}_Z^2) 
  \label{eq:like}
\end{equation}
\noindent where $\lvert {\bf R} \rvert$ indicates the determinant of ${\bf R}$,
and $\hat{\sigma}_Z^2$ is the optimal value of the variance given an estimate
of $\boldsymbol \theta$ and is defined by:
\begin{equation}
\hat{\sigma}_Z^2 = \frac{1}{n}({\bf g}-{\bf F}{\boldsymbol \beta})^T
  {\bf R}^{-1}({\bf g}-{\bf F}{\boldsymbol \beta})
\end{equation}
%\noindent where $\hat{\boldsymbol \beta}$ is the generalized least squares 
%estimate of $\boldsymbol \beta$ from:
%\begin{equation}
%\hat{\boldsymbol \beta} = \left[ {\bf F}^T{\bf R}^{-1}{\bf F} \right]^{-1}
%  {\bf F}^T{\bf R}^{-1}{\bf g}
%\end{equation}
\noindent Maximizing Eq.~\ref{eq:like} gives the maximum likelihood estimate 
of $\boldsymbol \theta$, which in turn defines $\sigma_Z^2$.

\subsubsection{Expected Improvement Function}\label{uq:reliability:global:ego:eif}

The expected improvement function is used to select the location at which a 
new training point should be added.
The EIF is defined as the expectation that any point in the search
space will provide a better solution than the current best solution
based on the expected values and variances predicted by the GP model.
An important feature of the EIF is that it provides a balance between 
exploiting areas of the design space where good solutions have been found, and 
exploring areas of the design space where the uncertainty is high.
First, recognize that at any point in the design space, the GP prediction
$\hat{G}()$ is a Gaussian distribution:
\begin{equation}
\hat{G}({\bf u}) \sim N\left[ \mu_G({\bf u}), \sigma_G({\bf u}) \right]
\end{equation}
\noindent where the mean $\mu_G()$ and the variance $\sigma_G^2()$ were 
defined in Eqs.~\ref{eq:exp} and \ref{eq:var}, respectively.
The EIF is defined as:~\cite{Jon98}
\begin{equation}
EI\bigl( \hat{G}({\bf u}) \bigr) \equiv 
  E\left[ \max \left( G({\bf u}^*) - \hat{G}({\bf u}),0 \right) \right]  
\end{equation}
\noindent where $G({\bf u}^*)$ is the current best solution chosen from among 
the true function values at the training points (henceforth referred to as
simply $G^*$).
This expectation can then be computed by integrating over the distribution 
$\hat{G}({\bf u})$ with $G^*$ held constant:
\begin{equation}
EI\bigl( \hat{G}({\bf u}) \bigr) = 
  \int_{-\infty}^{G^*} \left( G^* - G \right) \, \hat{G}({\bf u}) \; dG  
  \label{eq:eif_int}
\end{equation}
\noindent where $G$ is a realization of $\hat{G}$.
This integral can be expressed analytically as:~\cite{Jon98}
\begin{equation}
EI\bigl( \hat{G}({\bf u}) \bigr) = \left( G^* - \mu_G \right) \,
  \Phi\left( \frac{G^* - \mu_G}{\sigma_G} \right) + \sigma_G \,
  \phi\left( \frac{G^* - \mu_G}{\sigma_G} \right) \label{eq:eif}
\end{equation}
\noindent where it is understood that $\mu_G$ and $\sigma_G$ are functions of 
${\bf u}$.

The point at which the EIF is maximized is selected as an additional training 
point.
With the new training point added, a new GP model is built and then used 
to construct another EIF, which is then used to choose another new training 
point, and so on, until the value of the EIF at its maximized point is below 
some specified tolerance.
In Ref.~\cite{Hua06} this maximization is performed using a Nelder-Mead
simplex approach, which is a local optimization method.
Because the EIF is often highly multimodal~\cite{Jon98} it is expected that 
Nelder-Mead may fail to converge to the true global optimum.
In Ref.~\cite{Jon98}, a branch-and-bound technique for maximizing the EIF
is used, but was found to often be too expensive to run to convergence.
In Dakota, an implementation of the DIRECT global optimization algorithm is 
used~\cite{Gab01}.

It is important to understand how the use of this EIF leads to optimal
solutions.
Eq.~\ref{eq:eif} indicates how much the objective function value at ${\bf x}$ 
is expected to be less than the predicted value at the current best solution. 
Because the GP model provides a Gaussian distribution at each predicted 
point, expectations can be calculated.
Points with good expected values and even a small variance will
have a significant expectation of producing a better solution (exploitation), 
but so will points that have relatively poor expected values and greater 
variance (exploration).

The application of EGO to reliability analysis, however, is made more 
complicated due to the inclusion of equality constraints 
(see Eqs.~\ref{eq:ria_opt}-\ref{eq:pma_opt}).
For inverse reliability analysis, this extra complication is small.
The response being modeled by the GP is the objective function of the optimization 
problem (see Eq.~\ref{eq:pma_opt}) and the deterministic constraint might be handled 
through the use of a merit function, thereby allowing EGO to solve this 
equality-constrained optimization problem.
Here the problem lies in the interpretation of the constraint for multimodal
problems as mentioned previously.
In the forward reliability case, the response function appears in the
constraint rather than the objective.
Here, the maximization of the EIF is inappropriate because feasibility is
the main concern.
This application is therefore a significant departure from the original
objective of EGO and requires a new formulation.
For this problem, the expected feasibility function is introduced.

\subsubsection{Expected Feasibility Function}\label{uq:reliability:global:ego:eff}

The expected improvement function provides an indication of how much the true
value of the response at a point can be expected to be less than the current 
best solution.
It therefore makes little sense to apply this to the forward reliability problem 
where the goal is not to minimize the response, but rather to find where it is 
equal to a specified threshold value.
The expected feasibility function (EFF) is introduced here to provide an 
indication of how well the true value of the response is expected to satisfy 
the equality constraint $G({\bf u})\!=\!\bar{z}$.
Inspired by the contour estimation work in~\cite{Ran08}, this 
expectation can be calculated in a similar fashion as Eq.~\ref{eq:eif_int}
by integrating over a region in the immediate vicinity of the threshold value 
$\bar{z}\pm\epsilon$:
\begin{equation}
EF\bigl( \hat{G}({\bf u}) \bigr) = 
  \int_{z-\epsilon}^{z+\epsilon} 
    \bigl[ \epsilon - | \bar{z}-G | \bigr] \, \hat{G}({\bf u}) \; dG
\end{equation}
\noindent where $G$ denotes a realization of the distribution $\hat{G}$, as
before.
Allowing $z^+$ and $z^-$ to denote $\bar{z}\pm\epsilon$, respectively, this 
integral can be expressed analytically as:
\begin{align}
EF\bigl( \hat{G}({\bf u}) \bigr) &= \left( \mu_G - \bar{z} \right)
           \left[ 2 \, \Phi\left( \frac{\bar{z} - \mu_G}{\sigma_G} \right) -
                       \Phi\left( \frac{  z^-   - \mu_G}{\sigma_G} \right) -
                       \Phi\left( \frac{  z^+   - \mu_G}{\sigma_G} \right) 
          \right] \notag \\ & \ \ \ \ \ \ \ \ - 
  \sigma_G \left[ 2 \, \phi\left( \frac{\bar{z} - \mu_G}{\sigma_G} \right) \, -
                       \phi\left( \frac{  z^-   - \mu_G}{\sigma_G} \right) \, -
                       \phi\left( \frac{  z^+   - \mu_G}{\sigma_G} \right) 
          \right] \notag \\ & \ \ \ \ \ \ \ \ + \ \ \,
  \epsilon \left[      \Phi\left( \frac{  z^+   - \mu_G}{\sigma_G} \right) -
                       \Phi\left( \frac{  z^-   - \mu_G}{\sigma_G} \right)
          \right] \label{eq:eff}
\end{align}
where $\epsilon$ is proportional to the standard deviation of the GP 
predictor ($\epsilon\propto\sigma_G$).
In this case, $z^-$, $z^+$, $\mu_G$, $\sigma_G$, and $\epsilon$ are all functions of the 
location ${\bf u}$, while $\bar{z}$ is a constant.
Note that the EFF provides the same balance between exploration and 
exploitation as is captured in the EIF.
Points where the expected value is close to the threshold 
($\mu_G\!\approx\!\bar{z}$) and points with a large uncertainty in the 
prediction will have large expected feasibility values.
